STAT 313
What are the two data types R stores categorical variables as?
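For reference, here is a minimal sketch of the two storage types, character vectors and factors. It uses a small made-up vector of channel codes; the names `codes_chr` and `codes_fct` are hypothetical.

```r
# Hypothetical vector of channel codes (not the real Mack Creek data)
codes_chr <- c("C", "P", "SC", "P")

# Character vector: plain strings with no fixed set of allowed values
class(codes_chr)          # "character"

# Factor: integer codes plus a fixed set of levels
codes_fct <- factor(codes_chr)
class(codes_fct)          # "factor"
levels(codes_fct)         # "C" "P" "SC"
as.integer(codes_fct)     # 1 2 3 2
```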
dplyr: a tool bag for data wrangling
filter()
select()
mutate()
summarize()
arrange()
group_by()
The Pipe %>%
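The sketch below chains all six verbs with the pipe on a small data frame. The data frame `fish` and its values are made up for illustration and are only loosely modeled on the Mack Creek records. The column names are assumptions.

```r
library(dplyr)

# Hypothetical toy data (not the real Mack Creek records)
fish <- data.frame(
  section  = c("CC", "CC", "OG", "OG", "OG"),
  unittype = c("C", "P", "C", "SC", "R"),
  length   = c(80, 92, 75, 88, 70)
)

fish_summary <- fish %>%
  filter(length > 72) %>%                   # keep rows that meet a condition
  select(section, length) %>%               # keep only some columns
  mutate(length_cm = length / 10) %>%       # add a new column
  group_by(section) %>%                     # group rows by a categorical variable
  summarize(mean_cm = mean(length_cm)) %>%  # collapse each group to one row
  arrange(desc(mean_cm))                    # sort rows, largest mean first
```

Each `%>%` passes the result of the previous step in as the first argument of the next verb, so the pipeline reads top to bottom in the order the operations happen.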
If you wanted means for each level of a categorical variable, what would you do?
The HJ Andrews Experimental Forest houses one of the largest long-term ecological research stations. Researchers there study cutthroat trout and salamanders in the clear-cut and old-growth sections of Mack Creek.
# A tibble: 2 × 2
  section                               mean_length
  <chr>                                       <dbl>
1 clear cut forest                             85.3
2 upstream old growth coniferous forest        81.4
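One way to produce a table like this is to group_by() the categorical variable and then summarize() each group. In this sketch the data frame `trout` and its values are made up, so the means will not match the output above. The column names `section` and `length` are assumptions and may differ from the real data set.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration
trout <- data.frame(
  section = c("clear cut forest", "clear cut forest",
              "upstream old growth coniferous forest",
              "upstream old growth coniferous forest"),
  length  = c(84, 88, NA, 80)
)

# One mean per level of the categorical variable `section`
section_means <- trout %>%
  group_by(section) %>%
  summarize(mean_length = mean(length, na.rm = TRUE))
```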
Why na.rm = TRUE?
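The short sketch below shows why the argument matters. It uses a hypothetical vector `lengths` with one missing value: by default a single NA makes mean() return NA, and `na.rm = TRUE` drops the missing value before averaging.

```r
lengths <- c(84, 88, NA)      # hypothetical lengths with one missing value

mean(lengths)                 # NA: one missing value makes the whole mean missing
mean(lengths, na.rm = TRUE)   # 86: the NA is removed before averaging
```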
The sampled channels of Mack Creek were classified into the following unit types:
"C": cascade
"I": riffle
"IP": isolated pool
"P": pool
"R": rapid
"S": step (small falls)
"SC": side channel
NA: not sampled by unit
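Because the unit types are stored as short codes, one option is to attach readable labels with a named lookup vector inside mutate(). This is a sketch, not the lecture's method. `unit_labels`, `channels`, and `unit_label` are hypothetical names. Mapping NA to "not sampled by unit" follows the list above.

```r
library(dplyr)

# Lookup vector: unit-type code -> description (taken from the list above)
unit_labels <- c(
  C = "cascade", I = "riffle", IP = "isolated pool", P = "pool",
  R = "rapid", S = "step (small falls)", SC = "side channel"
)

# Hypothetical sample of unittype codes, including a missing value
channels <- data.frame(unittype = c("C", "P", "SC", NA))

channels <- channels %>%
  mutate(
    # Indexing by an NA code returns NA, which coalesce() replaces
    unit_label = unname(coalesce(unit_labels[unittype], "not sampled by unit"))
  )
```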
filter()-ing Specific Channel Types
Most of the cutthroat trout were captured in cascades (C), pools (P), and side channels (SC). Suppose we want to retain only these levels of the unittype variable.
%in%
If your filter includes more than one value, you must use %in%, not ==!
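The sketch below contrasts the two operators on a made-up data frame (`trout`, `kept`, and `wrong` are hypothetical names). `%in%` asks whether each value is any of the listed codes. `==` compares element by element and recycles the three codes down the rows, so rows get silently dropped.

```r
library(dplyr)

# Hypothetical toy data: one row per captured trout
trout <- data.frame(
  unittype = c("C", "P", "SC", "C", "R", "P"),
  length   = c(80, 92, 75, 88, 70, 85)
)

# Correct: keep every row whose unittype is any of the three codes
kept <- trout %>%
  filter(unittype %in% c("C", "P", "SC"))

# Incorrect: == recycles c("C", "P", "SC") along the rows, so each row is
# compared with only ONE code (row 6, a "P", is compared with "SC" and dropped)
wrong <- trout %>%
  filter(unittype == c("C", "P", "SC"))
```

Here `kept` has 5 rows (only the "R" row is removed), while `wrong` has only 4. No error or warning flags the mistake.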
Categorical Variables for Whom?
Suppose Cal Poly is interested in summarizing the demographics of its undergraduate students. It has designed the following question asking about students’ gender identity:
What is your gender identity?
Male, Female, Other
Who benefits from these options?
Who suffers from these options?
Data Feminism
Data science by whom?
Data science for whom?
Data sets about whom?
Data science with whose values?
Rethink binaries
How would you redesign the survey question about students’ gender identity?
Challenge power
An aura of objectivity
“We focus on four conventions which imbue visualizations with a sense of objectivity, transparency and facticity. These include: (a) two-dimensional viewpoints, (b) clean layouts, (c) geometric shapes and lines, (d) the inclusion of data sources.”
The work that visualization conventions do
Elevate emotion